feat: add schema update to table metadata builder #437
7836c3e to 62960f3
| const std::vector<int32_t>& schema_ids() const { return schema_ids_; } | ||
| const std::unordered_set<int32_t>& schema_ids() const { return schema_ids_; } | ||
|
|
||
| void ApplyTo(TableMetadataBuilder& builder) const override; |
What about implementing ApplyTo here? Although table updates do not seem to be invoked in the current architecture and may need refactoring, I think it makes sense to implement it here.
| auto name = std::string(partition_field.name()); | ||
| if (name.empty()) { | ||
| return InvalidArgument("Cannot use empty partition name: {}", name); | ||
| } |
| auto name = std::string(partition_field.name()); | |
| if (name.empty()) { | |
| return InvalidArgument("Cannot use empty partition name: {}", name); | |
| } | |
| auto name = std::string(partition_field.name()); | |
| ICEBERG_PRECHECK(!name.empty(), "Cannot use empty partition name: {}", name); |
| #include "iceberg/util/macros.h" | ||
| #include "iceberg/util/type_util.h" | ||
| #include "iceberg/util/visit_type.h" | ||
| #include "table_metadata.h" |
| #include "table_metadata.h" | |
| #include "iceberg/table_metadata.h" |
| auto max_it = std::ranges::max_element( | ||
| id_to_field.get(), | ||
| [](const auto& lhs, const auto& rhs) { return lhs.first < rhs.first; }); |
Wrap this in a Lazy or a simple std::call_once? That avoids searching every time.
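A minimal sketch of the std::call_once option, assuming the highest id can be cached on the object. `LazyMaxIdSchema`, `HighestFieldId`, and `scan_count` are hypothetical names for illustration, not part of the PR, and returning 0 for an empty map is an assumed convention. The scan uses the iterator-pair `std::max_element`, which performs the same search as the `std::ranges::max_element` call in the diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a schema that owns an id -> field-name index.
class LazyMaxIdSchema {
 public:
  explicit LazyMaxIdSchema(std::unordered_map<int32_t, std::string> id_to_field)
      : id_to_field_(std::move(id_to_field)) {}

  // Returns the largest field id, or 0 for an empty index (assumed convention).
  // std::call_once guarantees the scan runs at most once, even with
  // concurrent callers.
  int32_t HighestFieldId() const {
    std::call_once(highest_id_once_, [this] {
      ++scan_count_;  // instrumentation for this sketch only
      auto max_it = std::max_element(
          id_to_field_.begin(), id_to_field_.end(),
          [](const auto& lhs, const auto& rhs) { return lhs.first < rhs.first; });
      highest_id_ = (max_it == id_to_field_.end()) ? 0 : max_it->first;
    });
    return highest_id_;
  }

  // Number of times the map was actually scanned.
  int scan_count() const { return scan_count_; }

 private:
  std::unordered_map<int32_t, std::string> id_to_field_;
  mutable std::once_flag highest_id_once_;
  mutable int32_t highest_id_ = 0;
  mutable int scan_count_ = 0;
};
```

One trade-off to keep in mind: a `std::once_flag` member makes the class non-copyable and non-movable, so a Lazy wrapper held behind a pointer may fit better if the schema type has to stay copyable.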
| return max_it->first; | ||
| } | ||
|
|
||
| bool Schema::SameSchema(const Schema& other) const { return fields_ == other.fields_; } |
This also needs to compare identifier_fields.
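A rough sketch of what that comparison could look like, using simplified stand-in types rather than the real `Schema` and `SchemaField`. `FieldSketch` and `IdentifiedSchemaSketch` are hypothetical, and storing the identifier field ids in a `std::set` (so that their order does not affect equality) is an assumption of this sketch.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified field: only an id and a name.
struct FieldSketch {
  int32_t id;
  std::string name;
  bool operator==(const FieldSketch& other) const {
    return id == other.id && name == other.name;
  }
};

// Hypothetical schema that carries identifier field ids next to its fields.
class IdentifiedSchemaSketch {
 public:
  IdentifiedSchemaSketch(std::vector<FieldSketch> fields,
                         std::set<int32_t> identifier_field_ids)
      : fields_(std::move(fields)),
        identifier_field_ids_(std::move(identifier_field_ids)) {}

  // Two schemas match only if both the fields and the identifier ids match.
  bool SameSchema(const IdentifiedSchemaSketch& other) const {
    return fields_ == other.fields_ &&
           identifier_field_ids_ == other.identifier_field_ids_;
  }

 private:
  std::vector<FieldSketch> fields_;
  std::set<int32_t> identifier_field_ids_;
};
```

With a fields-only comparison, two schemas that differ only in their identifier fields would be treated as the same schema and could end up sharing a schema id.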
| if (!schema_ids_to_remove.contains(current_schema_id)) { | ||
| return InvalidArgument("Cannot remove current schema: {}", current_schema_id); | ||
| } |
| if (!schema_ids_to_remove.contains(current_schema_id)) { | |
| return InvalidArgument("Cannot remove current schema: {}", current_schema_id); | |
| } | |
| ICEBERG_PRECHECK(!schema_ids_to_remove.contains(current_schema_id), "Cannot remove current schema: {}", current_schema_id); |
| std::ranges::find_if(changes_, [new_schema_id](const auto& change) { | ||
| if (change->kind() != TableUpdate::Kind::kAddSchema) { | ||
| return false; | ||
| } | ||
| auto* add_schema = dynamic_cast<table::AddSchema*>(change.get()); | ||
| return add_schema->schema()->schema_id().value_or(Schema::kInitialSchemaId) == | ||
| new_schema_id; | ||
| }) != changes_.cend(); |
| std::ranges::find_if(changes_, [new_schema_id](const auto& change) { | |
| if (change->kind() != TableUpdate::Kind::kAddSchema) { | |
| return false; | |
| } | |
| auto* add_schema = dynamic_cast<table::AddSchema*>(change.get()); | |
| return add_schema->schema()->schema_id().value_or(Schema::kInitialSchemaId) == | |
| new_schema_id; | |
| }) != changes_.cend(); | |
| std::ranges::any_of(changes_, [new_schema_id](const auto& change) { | |
| if (change->kind() != TableUpdate::Kind::kAddSchema) { | |
| return false; | |
| } | |
| auto* add_schema = dynamic_cast<table::AddSchema*>(change.get()); | |
| return add_schema->schema()->schema_id().value_or(Schema::kInitialSchemaId) == | |
| new_schema_id; | |
| }); |
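To illustrate why the any_of form reads more directly than find_if compared against the end iterator, here is a self-contained sketch with stand-in types. `KindSketch`, `ChangeSketch`, and `HasPendingAddSchema` are hypothetical and only mirror the shape of the code under review. It uses the iterator-pair `std::any_of` so it also builds before C++20; `std::ranges::any_of` from the suggestion should behave the same way here.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins for TableUpdate::Kind and a pending change.
enum class KindSketch { kAddSchema, kOther };

struct ChangeSketch {
  KindSketch kind;
  int32_t schema_id;
};

// Returns true if any pending change adds a schema with the given id.
// any_of states the intent directly and yields a bool, so no comparison
// against an end iterator is needed.
bool HasPendingAddSchema(const std::vector<std::unique_ptr<ChangeSketch>>& changes,
                         int32_t new_schema_id) {
  return std::any_of(changes.begin(), changes.end(),
                     [new_schema_id](const auto& change) {
                       return change->kind == KindSketch::kAddSchema &&
                              change->schema_id == new_schema_id;
                     });
}
```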
| auto new_schema = std::make_shared<Schema>( | ||
| std::vector<SchemaField>(schema.fields().begin(), schema.fields().end()), | ||
| new_schema_id); |
identifier_field_ids is missing here.
| auto new_schema = std::make_shared<Schema>( | |
| std::vector<SchemaField>(schema.fields().begin(), schema.fields().end()), | |
| new_schema_id); | |
| auto new_schema = std::make_shared<Schema>( | |
| schema.fields() | std::ranges::to<std::vector>(), | |
| new_schema_id, schema.IdentifierFieldIds()); |
| ICEBERG_RETURN_UNEXPECTED(schema.Validate(metadata_.format_version)); | ||
|
|
||
| auto new_schema_id = ReuseOrCreateNewSchemaId(schema); | ||
| if (schemas_by_id_.find(new_schema_id) != schemas_by_id_.end()) { |
The Java implementation also checks that metadata_.last_column_id equals new_last_column_id. Does that matter here?
| ICEBERG_BUILDER_ASSIGN_OR_RETURN(auto schema_id, | ||
| impl_->AddSchema(*schema, new_last_column_id)); |
| ICEBERG_BUILDER_ASSIGN_OR_RETURN(auto schema_id, | |
| impl_->AddSchema(*schema, new_last_column_id)); | |
| ICEBERG_BUILDER_RETURN_IF_ERROR(impl_->AddSchema(*schema, new_last_column_id)); |
schema_id is not used.
| static constexpr int64_t kInvalidSequenceNumber = -1; | ||
| static constexpr int64_t kInitialRowId = 0; | ||
|
|
||
| static inline const std::unordered_map<TypeId, int8_t> kMinFormatVersions = {}; |
Not initialized?
| Status TableMetadataBuilder::Impl::SetCurrentSchema(int32_t schema_id) { | ||
| if (schema_id == kLastAdded) { | ||
| if (!last_added_schema_id_.has_value()) { | ||
| return InvalidArgument("Cannot set last added schema: no schema has been added"); |
| return InvalidArgument("Cannot set last added schema: no schema has been added"); | |
| return ValidationFailed("Cannot set last added schema: no schema has been added"); |
No description provided.